对表示形式的研究对于任何形式的交流都是至关重要的,我们有效利用它们的能力至关重要。本文介绍了一种新颖的理论 - 代表性系统理论 - 旨在从三个核心角度从三个核心角度进行抽象地编码各种表示:语法,综合及其属性。通过介绍建筑空间的概念,我们能够在一个统一的范式下编码这些核心组件中的每个核心组件。使用我们的代表性系统理论,有可能在结构上将一个系统中的表示形式转换为另一个系统的表示形式。我们结构转化技术的固有方面是根据表示的属性(例如它们的相对认知有效性或结构复杂性)的代表选择。提供一般结构转化技术的主要理论障碍是缺乏终止算法。代表系统理论允许在没有终止算法的情况下衍生部分变换。由于代表性系统理论提供了一种通用编码代表系统的通用方法,因此消除了进一步的关键障碍:需要设计特定于系统的结构转换算法,这是当不同系统采用不同的形式化方法时所必需的。因此,代表性系统理论是第一个提供统一方法来编码表示形式,通过结构转换支持表示形式的第一个通用框架,并具有广泛的实用应用。
translated by 谷歌翻译
The estimation of the generalization error of classifiers often relies on a validation set. Such a set is hardly available in few-shot learning scenarios, a highly disregarded shortcoming in the field. In these scenarios, it is common to rely on features extracted from pre-trained neural networks combined with distance-based classifiers such as nearest class mean. In this work, we introduce a Gaussian model of the feature distribution. By estimating the parameters of this model, we are able to predict the generalization error on new classification tasks with few samples. We observe that accurate distance estimates between class-conditional densities are the key to accurate estimates of the generalization performance. Therefore, we propose an unbiased estimator for these distances and integrate it in our numerical analysis. We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy in few-shot settings.
translated by 谷歌翻译
Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build artificial systems that can analogously plan actions on top of imagined images. To this end, we introduce Mental Imagery for Robotic Affordances (MIRA), an action reasoning framework that optimizes actions with novel-view synthesis and affordance prediction in the loop. Given a set of 2D RGB images, MIRA builds a consistent 3D scene representation, through which we synthesize novel orthographic views amenable to pixel-wise affordances prediction for action optimization. We illustrate how this optimization process enables us to generalize to unseen out-of-plane rotations for 6-DoF robotic manipulation tasks given a limited number of demonstrations, paving the way toward machines that autonomously learn to understand the world around them for planning actions.
translated by 谷歌翻译
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. In this work, we propose to capture each of these aspects by modeling the surgical scene with a disentangled latent scene graph representation, which we can then process using a graph neural network. Unlike previous approaches using graph representations, we explicitly encode in our graphs semantic information such as object locations and shapes, class probabilities and visual features. We also incorporate an auxiliary image reconstruction objective to help train the latent graph representations. We demonstrate the value of these components through comprehensive ablation studies and achieve state-of-the-art results for critical view of safety prediction across multiple experimental settings.
translated by 谷歌翻译
Nowadays, the applications of hydraulic systems are present in a wide variety of devices in both industrial and everyday environments. The implementation and usage of hydraulic systems have been well documented; however, today, this still faces a challenge, the integration of tools that allow more accurate information about the functioning and operation of these systems for proactive decision-making. In industrial applications, many sensors and methods exist to measure and determine the status of process variables (e.g., flow, pressure, force). Nevertheless, little has been done to have systems that can provide users with device-health information related to hydraulic devices integrated into the machinery. Implementing artificial intelligence (AI) technologies and machine learning (ML) models in hydraulic system components has been identified as a solution to the challenge many industries currently face: optimizing processes and carrying them out more safely and efficiently. This paper presents a solution for the characterization and estimation of anomalies in one of the most versatile and used devices in hydraulic systems, cylinders. AI and ML models were implemented to determine the current operating status of these hydraulic components and whether they are working correctly or if a failure mode or abnormal condition is present.
translated by 谷歌翻译
The aim of this work is to introduce MaRF, a novel framework able to synthesize the Martian environment using several collections of images from rover cameras. The idea is to generate a 3D scene of Mars' surface to address key challenges in planetary surface exploration such as: planetary geology, simulated navigation and shape analysis. Although there exist different methods to enable a 3D reconstruction of Mars' surface, they rely on classical computer graphics techniques that incur high amounts of computational resources during the reconstruction process, and have limitations with generalizing reconstructions to unseen scenes and adapting to new images coming from rover cameras. The proposed framework solves the aforementioned limitations by exploiting Neural Radiance Fields (NeRFs), a method that synthesize complex scenes by optimizing a continuous volumetric scene function using a sparse set of images. To speed up the learning process, we replaced the sparse set of rover images with their neural graphics primitives (NGPs), a set of vectors of fixed length that are learned to preserve the information of the original images in a significantly smaller size. In the experimental section, we demonstrate the environments created from actual Mars datasets captured by Curiosity rover, Perseverance rover and Ingenuity helicopter, all of which are available on the Planetary Data System (PDS).
translated by 谷歌翻译
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
translated by 谷歌翻译
Spoken language understanding (SLU) is a task aiming to extract high-level semantics from spoken utterances. Previous works have investigated the use of speech self-supervised models and textual pre-trained models, which have shown reasonable improvements to various SLU tasks. However, because of the mismatched modalities between speech signals and text tokens, previous methods usually need complex designs of the frameworks. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various tasks in SLU. To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR itself can improve the representations from speech self-supervised models. More importantly, it is shown as an efficient connector between speech and textual pre-trained models, improving the performances of five different SLU tasks. Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark.
translated by 谷歌翻译
We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
translated by 谷歌翻译
最近对反向传播的近似(BP)减轻了BP的许多计算效率低下和与生物学的不兼容性,但仍然存在重要的局限性。此外,近似值显着降低了基准的准确性,这表明完全不同的方法可能更富有成果。在这里,基于在软冠军全网络中Hebbian学习的最新理论基础上,我们介绍了多层softhebb,即一种训练深神经网络的算法,没有任何反馈,目标或错误信号。结果,它通过避免重量传输,非本地可塑性,层更新的时间锁定,迭代平衡以及(自我)监督或其他反馈信号来实现效率,这在其他方法中是必不可少的。与最先进的生物学知识学习相比,它提高的效率和生物兼容性不能取得准确性的折衷,而是改善了准确性。 MNIST,CIFAR-10,STL-10和IMAGENET上最多五个隐藏层和添加的线性分类器,分别达到99.4%,80.3%,76.2%和27.3%。总之,SOFTHEBB显示出与BP的截然不同的方法,即对几层的深度学习在大脑中可能是合理的,并提高了生物学上的机器学习的准确性。
translated by 谷歌翻译